Universal Dependency Annotation for Multilingual Parsing
نویسندگان
چکیده
We present a new collection of treebanks with homogeneous syntactic dependency annotation for six languages: German, English, Swedish, Spanish, French and Korean. To show the usefulness of such a resource, we present a case study of crosslingual transfer parsing with more reliable evaluation than has been possible before. This ‘universal’ treebank is made freely available in order to facilitate research on multilingual dependency parsing.1
منابع مشابه
Universal Dependencies for Dargwa Mehweb
The Universal Dependencies (UD) project aims to create the unified annotation schemes across languages. With its own annotation principles and abstract inventory for parts of speech, morphosyntactic features and dependency relations, UD aims to facilitate multilingual parser development, crosslingual learning, and parsing research from a language typology perspective. This paper provides the de...
متن کاملUniversal Dependencies: A Cross-Linguistic Perspective on Grammar and Lexicon
Universal Dependencies is an initiative to develop cross-linguistically consistent grammatical annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning and parsing research from a language typology perspective. It assumes a dependency-based approach to syntax and a lexicalist approach to morphology, which together entail that the funda...
متن کاملConversion from Paninian Karakas to Universal Dependencies for Hindi Dependency Treebank
Universal Dependencies (UD) are gaining much attention of late for systematic evaluation of cross-lingual techniques for crosslingual dependency parsing. In this paper we present our work in line with UD. Our contribution to this is manifold. We extend UD to Indian languages through conversion of Pānịnian Dependencies to UD for the Hindi Dependency Treebank (HDTB). We discuss the differences in...
متن کاملAn annotation scheme for Persian based on Autonomous Phrases Theory and Universal Dependencies
A treebank is a corpus with linguistic annotations above the level of the parts of speech. During the first half of the present decade, three treebanks have been developed for Persian either originally or subsequently based on dependency grammar: Persian Treebank (PerTreeBank), Persian Syntactic Dependency Treebank, and Uppsala Persian Dependency Treebank (UPDT). The syntactic analysis of a sen...
متن کاملUniversal dependencies for Uyghur
The Universal Dependencies (UD) Project seeks to build a cross-lingual studies of treebanks, linguistic structures and parsing. Its goal is to create a set of multilingual harmonized treebanks that are designed according to a universal annotation scheme. In this paper, we report on the conversion of the Uyghur dependency treebank to a UD version of the treebank which we term the Uyghur Universa...
متن کامل